Lo mejor de dos idiomas - Cross-Lingual Linkage of Geotagged Wikipedia Articles

Author

  • Dirk Ahlers
Abstract

Different language versions of Wikipedia contain articles about the same place. However, the existence of an article in one language does not guarantee that a counterpart exists in another language, or that the two are linked. This paper examines geotagged articles describing places in Honduras in the Spanish and English language versions. It demonstrates that a method based on simple features can reliably identify article pairs describing the same semantic place concept, and evaluates the method against the existing interlinks as well as a manual assessment.
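
The abstract does not name the "simple features" it relies on. As a rough illustration only, the sketch below assumes two plausible ones: the great-circle distance between the articles' geotags and the string similarity of their titles. The function names, the dictionary layout, and the thresholds `max_km` and `min_sim` are hypothetical and are not taken from the paper.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def title_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two titles."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_candidate_pair(es_article, en_article, max_km=5.0, min_sim=0.8):
    """Assumed decision rule: the geotags are close AND the titles are similar.

    Each article is a dict with 'title', 'lat', and 'lon' keys (hypothetical layout).
    The thresholds are illustrative guesses, not values reported in the paper.
    """
    distance = haversine_km(
        es_article["lat"], es_article["lon"],
        en_article["lat"], en_article["lon"],
    )
    similarity = title_similarity(es_article["title"], en_article["title"])
    return distance <= max_km and similarity >= min_sim
```

In such a setup, every Spanish article would be compared with nearby English articles, and pairs passing both tests would be proposed as describing the same place. The proposals could then be checked against the existing interlinks, in the spirit of the evaluation the abstract describes.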

Similar resources

On finding cross-lingual article pairs

Finding a Wikipedia article in another language is often achievable with the built-in interlanguage links. We explore the possibility of automatically generating these links for geotagged articles as an application of entity resolution at the article level. This has the potential to improve Wikipedia, and also allows the use of a well-curated ground truth for the merging algorithm. The resolution is bas...

Towards Cross-lingual Patent Wikification

This paper demonstrates the effectiveness of cross-lingual patent wikification, which links technical terms in a patent application document to their corresponding Wikipedia articles in different languages. The number of links clearly increases because different language versions of Wikipedia cover different sets of technical terms. We present an experiment of Japanese-to-English cross-lingu...

Untangling the Cross-Lingual Link Structure of Wikipedia

Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then pres...

A Comparison of Approaches for Measuring Cross-Lingual Similarity of Wikipedia Articles

Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are often connected through inter-language links. However, the extent to which these articles are similar is highly variable and this may impact on the use of Wikipedia as a compar...

Document Categorization using Multilingual Associative Networks based on Wikipedia

Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...

Publication date: 2013